5.2 Insights from Twitter/X Data
5.2.1 Reach Distribution by Topics
library(topicmodels)
# Fit a 10-topic LDA model via Gibbs sampling (seeded for reproducibility)
lda.model <- LDA(myDTM, k = 10, method = 'Gibbs', control = list(seed = 2022))
topic_matrix <- terms(lda.model, 10)  # top 10 terms per topic
topic_matrix
##       Topic 1    Topic 2   Topic 3   Topic 4              Topic 5
##  [1,] "time"     "tatum"   "brown"   "game"               "win"
##  [2,] "will"     "jayson"  "jaylen"  "tonight"            "get"
##  [3,] "thank"    "mvp"     "trade"   "bleedgreen"         "bleedgreen"
##  [4,] "mazzulla" "embiid"  "via"     "night"              "now"
##  [5,] "joe"      "kevin"   "per"     "heat"               "let"
##  [6,] "coach"    "top"     "star"    "unfinishedbusiness" "don"
##  [7,] "tell"     "giannis" "says"    "finals"             "big"
##  [8,] "efforts"  "jimmy"   "without" "series"             "new"
##  [9,] "sources"  "butler"  "lillard" "business"           "need"
## [10,] "years"    "joel"    "jordan"  "miami"              "today"
##       Topic 6   Topic 7  Topic 8  Topic 9    Topic 10
##  [1,] "can"     "just"   "team"   "back"     "season"
##  [2,] "like"    "got"    "year"   "horford"  "points"
##  [3,] "good"    "love"   "one"    "smart"    "pts"
##  [4,] "play"    "fans"   "best"   "williams" "games"
##  [5,] "going"   "see"    "player" "marcus"   "first"
##  [6,] "said"    "know"   "two"    "left"     "last"
##  [7,] "great"   "even"   "still"  "white"    "reb"
##  [8,] "think"   "live"   "league" "right"    "ast"
##  [9,] "better"  "fan"    "next"   "brogdon"  "point"
## [10,] "defense" "people" "way"    "derrick"  "career"
##   document       topic
## 1        1           7
## 2        2           6
## 3        3 1, 5, 7, 10
## 4        4           6
## 5        5       1, 10
## 6        6           6
5.2.1.1 Topic Categories:
Using Latent Dirichlet Allocation (LDA), an unsupervised topic modeling method (Chen 2011), we identified 10 salient topics in the Celtics-related conversations (see the term lists above and the summary below). Among these topics, we focus on conversations about fan engagement (i.e., topic category 4).
This topic category likely encapsulates discussions that are directly relevant to increasing user engagement and fostering a strong fan community. By focusing on this category, efforts can be more effectively channeled towards strategies that resonate with the fans’ interests and preferences, thereby amplifying engagement and participation in online discussions about the franchise.
Category 1: Star Player
Topic 2: Centers around Jayson Tatum and related MVP title discussion
Topic 3: Jaylen Brown and potential trade rumors
Category 2: Game
Topic 1: Timing of games/events
Topic 6: Strategies/opinions on games
Category 3: Franchise Management
Topic 8: Team Management
Topic 9: Other players
Category 4: Fan Engagement
Topic 4: Celtics slogan
Topic 5: Appeal for winning and achievements
Topic 7: Fans community
Topic 10: Season statistics & Career achievements
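The grouping above can be encoded as a lookup table so that each tweet's topic number maps to a category. The sketch below is illustrative only: the names `topic_category` and `tweets` are hypothetical and do not come from the original analysis, and it assumes each tweet carries a single dominant topic (documents assigned to several topics would need to be split first).

```r
# Hypothetical lookup from LDA topic number to the category labels listed above
topic_category <- c(
  "1" = "Game",                 "2" = "Star Player",
  "3" = "Star Player",          "4" = "Fan Engagement",
  "5" = "Fan Engagement",       "6" = "Game",
  "7" = "Fan Engagement",       "8" = "Franchise Management",
  "9" = "Franchise Management", "10" = "Fan Engagement"
)

# Toy data standing in for tweets that each carry one dominant topic
tweets <- data.frame(document = 1:4, topic = c(7, 6, 4, 2))

# Index the lookup by name (as character) to attach a category to each tweet
tweets$category <- unname(topic_category[as.character(tweets$topic)])

# Flag the fan engagement tweets (category 4 in the text)
tweets$is_fan_engagement <- tweets$category == "Fan Engagement"
print(tweets)
```

Keeping the mapping in one named vector means a change to the categorization only has to be made in one place.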
5.2.1.2 Topic & Tweets Daily Reach
A line plot was created to show the daily reach of tweets under each topic category over the course of a year (see figure 5.4). It shows how the reach of different conversation topics fluctuates over time. To test whether fan engagement conversations achieve a higher daily reach than other topics, we also ran a one-sided Welch two-sample t-test.
However, both visual inspection (the “eye-ball” method) and the test result indicate that fan engagement conversations do not achieve a significantly higher daily reach (mean = 24,458.26 versus 24,082.42 for other topics; t = 0.60, p = 0.27), nor do they show a distinct pattern in how their reach is distributed.
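library(ggplot2)
library(plotly)  # ggplotly() turns the ggplot into an interactive plot
# Daily reach over time, one line per topic
tt <- ggplot(data_1, aes(x = Day, y = Daily_Reach, group = as.factor(topic), color = as.factor(topic))) +
  geom_line() + geom_point() +
  labs(title = "Figure 5.4: Topic Reach Transition", x = "Day", y = "Daily Reach") +
  theme_minimal()
t <- ggplotly(tt)
t
##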
## Welch Two Sample t-test
##
## data: fan$Daily_Reach and other_topic$Daily_Reach
## t = 0.60437, df = 11007, p-value = 0.2728
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -647.1339 Inf
## sample estimates:
## mean of x mean of y
## 24458.26 24082.42
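The call that produced this output is not shown. Judging from the output header (Welch test, alternative "greater", and the two `Daily_Reach` vectors), it was probably close to the sketch below. The data here are simulated stand-ins, so the numbers will not match the report; only the names `fan` and `other_topic` are taken from the output above.

```r
set.seed(2022)  # arbitrary seed so the simulated example is reproducible

# Simulated stand-ins for the two groups; the real `fan` and `other_topic`
# data frames are not reproduced here
fan         <- data.frame(Daily_Reach = rnorm(200, mean = 24500, sd = 30000))
other_topic <- data.frame(Daily_Reach = rnorm(300, mean = 24000, sd = 30000))

# One-sided Welch test: H1 is that mean fan engagement reach exceeds the
# other topics' mean. var.equal = FALSE (the default) gives the Welch version.
res <- t.test(fan$Daily_Reach, other_topic$Daily_Reach,
              alternative = "greater", var.equal = FALSE)
print(res)

# Reject H0 at the 5% level only if the p-value is below 0.05
res$p.value < 0.05
```

With a one-sided alternative, the confidence interval is unbounded above, which is why the report shows `Inf` as its upper limit.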
5.2.2 Reach Distribution by Sentiment
Additionally, a sentiment analysis was performed on the sample data using the “syuzhet” package in R.
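One plausible reading of this step, sketched below, is that each tweet receives a numeric sentiment score that is then collapsed to its sign (1, 0, or -1) before the data are split by sentiment. The scores here are invented; in practice they would come from a call such as `syuzhet::get_sentiment()`, which appears only in a comment so the sketch runs without the package. The `tweets` data frame and its column names are hypothetical.

```r
# Hypothetical tweets; `score` stands in for the output of, e.g.,
# syuzhet::get_sentiment(tweets$text, method = "syuzhet")
tweets <- data.frame(
  text  = c("great win tonight", "game starts at 7", "terrible defense"),
  score = c(1.25, 0, -0.75)
)

# Collapse each score to its sign: 1 = positive, 0 = neutral, -1 = negative
tweets$sentiment <- sign(tweets$score)

# Split into the subsets used for separate trend regressions
positive_sentiment <- subset(tweets, sentiment == 1)
negative_sentiment <- subset(tweets, sentiment == -1)

unique(tweets$sentiment)
```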
A line plot was then created to track the daily reach of tweets grouped by sentiment (see figure 5.5). To quantify how daily reach changes over time for each sentiment, simple linear regressions of daily reach on day were fitted.
Both figure 5.5 and the regression results show an overall decreasing trend in the daily reach of tweets with positive sentiment. Specifically, each additional day is associated with a decrease of about 144 in daily reach, a statistically significant negative trend (p < 0.001). For tweets with negative sentiment, daily reach also decreases significantly, by about 164 per day.
gg <- ggplot(data, aes(x = Day, y = Daily_Reach, group = sentiment, color = sentiment)) +
  geom_line() + geom_point() +
  labs(title = "Figure 5.5: Sentiment Reach Transition", x = "Day", y = "Daily Reach") +
  theme_minimal()
p <- ggplotly(gg)
p
library(stargazer)
# Linear trend of daily reach over time, fitted separately by sentiment
reg_1 <- lm(Daily_Reach ~ Day, data = positive_sentiment)
reg_2 <- lm(Daily_Reach ~ Day, data = negative_sentiment)
stargazer(reg_1, reg_2, type = "text", star.cutoffs = c(.05, .01, .001))
##
## =======================================================================
##                                     Dependent variable:
##                     ---------------------------------------------------
##                                         Daily_Reach
##                                (1)                       (2)
## -----------------------------------------------------------------------
## Day                        -144.021***               -164.246***
##                              (5.797)                   (8.748)
##
## Constant                2,823,105.000***          3,216,784.000***
##                          (112,601.000)             (169,943.400)
##
## -----------------------------------------------------------------------
## Observations                  6,020                     2,609
## R2                            0.093                     0.119
## Adjusted R2                   0.093                     0.119
## Residual Std. Error   33,668.440 (df = 6018)    32,533.770 (df = 2607)
## F Statistic         617.201*** (df = 1; 6018) 352.500*** (df = 1; 2607)
## =======================================================================
## Note:                                     *p<0.05; **p<0.01; ***p<0.001
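To make the slope interpretation concrete, here is a minimal sketch on simulated data. It assumes that `Day` is stored as a calendar date, which R treats as the number of days since 1970-01-01. If that assumption holds, it would also explain why the fitted constants are so large: the intercept would be the extrapolated reach at day zero, not the reach at the start of the observation window. The data and the name `sim` are invented for illustration.

```r
set.seed(1)

# Simulated daily reach that falls by 150 per day on average (illustrative values)
sim <- data.frame(Day = seq(as.Date("2023-01-01"), by = "day", length.out = 300))
sim$Daily_Reach <- 60000 - 150 * as.numeric(sim$Day - sim$Day[1]) +
  rnorm(nrow(sim), sd = 5000)

# Regress on the numeric date (days since 1970-01-01)
fit <- lm(Daily_Reach ~ as.numeric(Day), data = sim)

slope     <- unname(coef(fit)[2])  # change in daily reach per additional day
intercept <- unname(coef(fit)[1])  # extrapolated reach on 1970-01-01, hence very large

# Re-centering Day on the first observation leaves the slope unchanged
# but makes the intercept the fitted reach on the first day
fit_centered <- lm(Daily_Reach ~ I(as.numeric(Day - min(Day))), data = sim)

round(c(slope = slope, intercept = intercept,
        centered_intercept = unname(coef(fit_centered)[1])))
```

Only the slope carries the "per day" meaning discussed in the text; the intercept depends entirely on where day zero is placed.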

We offer two potential explanations for these results:
Waning Novelty: At the beginning of the season, the fresh start and new possibilities often stimulate interest and positive sentiment. Over time, this novelty may fade.
Late-Season Performance Decline & Adjustment of Expectations: Fans’ expectations of the team may diminish, especially since the team failed to meet those expectations last season, which might lead to decreased interest in online discussions about the Celtics.
However, compared with our competitor, the LA Lakers, the decline is steeper for the Celtics: for both positive and negative conversations on Twitter/X, daily reach for Lakers tweets falls by roughly 100 to 130 less per day than for Celtics tweets (about 40 versus 144 for positive tweets, and about 32 versus 164 for negative tweets; see the Lakers regression results below). Because the first explanation applies equally to the Lakers and other NBA franchises, the team’s disappointing performances might have a greater influence on fans’ online engagement than expected, especially given the Celtics’ multigenerational ties to the city’s sports culture.
In the specific case of the Celtics, the unexpected defeat in the in-season tournament in the past two weeks, particularly with the star player performing below expectations in the 1/8 Finals, could have exacerbated the decrease in online engagement in late 2023.
## [1] 1 0 -1
# Same trend regressions for Lakers tweets, for comparison
reg_1la <- lm(Daily_Reach ~ Day, data = positive_sentimentla)
reg_2la <- lm(Daily_Reach ~ Day, data = negative_sentimentla)
stargazer(reg_1la, reg_2la, type = "text", star.cutoffs = c(.05, .01, .001))
##
## ======================================================================
##                                    Dependent variable:
##                     --------------------------------------------------
##                                        Daily_Reach
##                                (1)                      (2)
## ----------------------------------------------------------------------
## Day                        -39.990***               -32.280***
##                             (3.659)                  (5.468)
##
## Constant                 784,277.100***           634,972.200***
##                          (71,097.650)             (106,215.800)
##
## ----------------------------------------------------------------------
## Observations                  7,158                    2,820
## R2                            0.016                    0.012
## Adjusted R2                   0.016                    0.012
## Residual Std. Error  21,511.410 (df = 7156)   20,387.020 (df = 2818)
## F Statistic         119.431*** (df = 1; 7156) 34.856*** (df = 1; 2818)
## ======================================================================
## Note:                                    *p<0.05; **p<0.01; ***p<0.001
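The comparison between the two franchises can be checked directly from the `Day` coefficients in the two regression tables. The short calculation below only restates those published coefficients; the data frame name `slopes` is introduced here for illustration. It is a descriptive comparison, not a formal test of whether the slopes differ.

```r
# Day coefficients copied from the two regression tables above
slopes <- data.frame(
  sentiment = c("positive", "negative"),
  celtics   = c(-144.021, -164.246),
  lakers    = c(-39.990, -32.280)
)

# How much faster Celtics reach declines per day than Lakers reach
slopes$extra_decline <- slopes$lakers - slopes$celtics

# Ratio of the two daily declines
slopes$ratio <- slopes$celtics / slopes$lakers

print(slopes)
```

The extra daily decline works out to about 104 for positive tweets and about 132 for negative tweets, so the Celtics slope is roughly 3.6 and 5.1 times the Lakers slope, respectively.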
5.2.3 Correlation Analysis to Understand Reach Metrics
To better understand which metrics are associated with overall reach among audiences and fans, a correlation matrix was created. This matrix displays the correlations between reach and other observable social media metrics.
As the matrix shows:
Sentiment: Surprisingly, sentiment scores show almost no correlation with any other observable social media metric.
Overall Reach, Likes, and Replies: The numbers of likes and replies are both strongly and positively associated with audiences’ overall reach on Twitter/X.
Therefore, two actionable insights were identified to meet one of the given objectives (i.e., boost overall reach):
Maximize Likes to Boost Reach: Because tweets that receive more likes also tend to have a higher daily reach, we will focus on creating content that the audience is more likely to like.
Encourage Replies for Greater Interaction: Since replies are associated with higher daily reach, the campaign should encourage fans and franchise supporters to more actively interact and participate in conversations around the franchise.
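library(reshape2)  # melt() reshapes the correlation matrix to long format
library(ggplot2)
# One row per (Var1, Var2) pair, with the correlation in `value`
melted_cormat <- melt(cor_matrix)
ggplot(data = melted_cormat, aes(x = Var1, y = Var2)) +
  geom_tile(aes(fill = value), color = 'white') +
  scale_fill_gradient2(low = 'blue', high = 'red', mid = 'grey', midpoint = 0, limits = c(-1, 1), space = 'Lab', name = 'Correlation') +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 12, hjust = 1),
        axis.text.y = element_text(size = 12)) +
  coord_fixed()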
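The heatmap code above assumes that `cor_matrix` already exists; how it was built is not shown. A minimal sketch of one way to construct it, using base R's `cor()` on simulated data, follows. The data frame `metrics`, its column names, and all values are assumptions made for illustration.

```r
set.seed(42)

# Simulated tweet-level metrics (invented values; column names are assumptions)
n <- 500
likes   <- rpois(n, lambda = 5)
replies <- rpois(n, lambda = 2)
metrics <- data.frame(
  Daily_Reach = 1000 * likes + 2000 * replies + rnorm(n, sd = 3000),
  Likes       = likes,
  Reply       = replies,
  sentiment   = sample(c(-1, 0, 1), n, replace = TRUE)
)

# Pairwise Pearson correlations; use = "complete.obs" drops rows with missing values
cor_matrix <- cor(metrics, use = "complete.obs")
round(cor_matrix, 2)
```

Pearson correlation is sensitive to the heavy right tails typical of reach and like counts, so `method = "spearman"` may be a reasonable robustness check.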
5.2.3.1 Distribution of Reach, Likes, & Replies by Months
Given these actionable insights, box plots and a density plot were used to examine the monthly distributions of likes, replies, and reach.
The peak in replies occurred from February to May 2023 (the regular season period after the All-Star weekend, see figure 5.7). The peak in likes was observed from May to July 2023 (from mid-NBA playoffs to the end of the NBA season, see figure 5.8), and the peak in monthly reach occurred from May to August 2023 (also covering the mid-NBA playoffs to the end of the NBA season, see figure 5.9). Because the peak in replies precedes the peaks in likes and overall reach, reply activity might serve as a leading indicator of both.
library(gridExtra)  # grid.arrange() places the two box plots side by side
# Monthly distribution of replies
ph_1 <- ggplot(data = replypart, aes(x = month_factor, y = Reply, fill = month_factor)) +
  geom_boxplot() +
  ggtitle("Figure 5.7: Replies By Months") +
  theme_minimal() +
  scale_fill_discrete(name = "Month") +
  xlab("Month") +
  ylab("Replies")
# Monthly distribution of likes
ph_2 <- ggplot(data = likepart, aes(x = month_factor, y = Likes, fill = month_factor)) +
  geom_boxplot() +
  ggtitle("Figure 5.8: Like By Months") +
  theme_minimal() +
  scale_fill_discrete(name = "Month") +
  xlab("Month") +
  ylab("Likes") +
  # Cap the y-axis at 20; oob = squish pins larger values to the limit instead of dropping them
  scale_y_continuous(limits = c(0, 20), oob = scales::squish)
grid.arrange(ph_1, ph_2, ncol = 2)
# Month key: the "YYYY-MM" prefix of the date
fan_engage$month <- substr(fan_engage$Day, 1, 7)
Densityplot <- ggplot(fan_engage, aes(x = Daily_Reach)) +
  geom_density(aes(fill = month), alpha = 0.4) +
  # Dashed red line marks the overall mean daily reach
  geom_vline(aes(xintercept = mean(Daily_Reach)), linetype = "dashed", color = "red") +
  ggtitle("Figure 5.9: Density Plot for Reach Distribution by Month") +
  xlab("Reach Number") +
  ylab("Density") +
  theme_minimal()
print(Densityplot)
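The peak months reported above are read off the plots. As a complement, a numeric monthly summary can make the same comparison explicit. The sketch below uses a small invented data set (`reply_toy` is a hypothetical name) and the same "YYYY-MM" month key as the density plot code; the choice of the median as the summary statistic is an assumption.

```r
# Toy data: one row per tweet with a date and a reply count (invented values)
reply_toy <- data.frame(
  Day   = as.Date(c("2023-01-10", "2023-01-20", "2023-03-05",
                    "2023-03-18", "2023-06-02", "2023-06-25")),
  Reply = c(1, 2, 8, 6, 3, 2)
)

# Same month key as used for the density plot: the "YYYY-MM" prefix of the date
reply_toy$month <- substr(as.character(reply_toy$Day), 1, 7)

# Median replies per month, then the month with the highest median
monthly <- aggregate(Reply ~ month, data = reply_toy, FUN = median)
peak_month <- monthly$month[which.max(monthly$Reply)]

print(monthly)
peak_month
```

The same pattern applies to likes and reach by swapping the column passed to `aggregate()`.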
5.2.4 A 2024 Customized Pre-Playoff Twitter/X Campaign
Given the above insights, we aim to launch a customized campaign on Twitter/X. The campaign intends to capitalize on the increased engagement before the 2024 playoffs and build momentum by running a series of interactive Twitter/X activities that encourage replies.
Because reply activity last season peaked mainly during the part of the regular season that follows the All-Star weekend, this campaign will start the week after the All-Star weekend and end before the 2024 NBA playoffs.